PartJoin: An Efficient Storage and Query Execution for Data Warehouses
Abstract
The performance of OLAP queries can improve drastically when the warehouse data is properly selected and indexed. The problems of selecting and materializing views and of indexing data have been studied extensively in the data warehousing environment. Data partitioning can also greatly improve query performance, and it has an advantage over data selection and indexing: it requires no additional storage. In this paper, we show that it is beneficial to integrate data partitioning and indexing (join indexes) to improve the performance of data warehousing queries. We present a data warehouse tuning strategy, called PartJoin, that decomposes the fact and dimension tables of a star schema and then selects join indexes, thereby taking advantage of both techniques. Finally, we present the results of an experimental evaluation that demonstrates the effectiveness of our strategy in reducing query processing cost and making economical use of storage space.
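As a rough illustration of the idea in the abstract, the sketch below partitions a dimension table, derives a matching partitioning of the fact table, and builds a small join index per fragment. It is not the PartJoin algorithm itself: the tables, column names, and helper functions are all hypothetical, and the cost-based selection of which join indexes to keep is not modelled at all.

```python
from collections import defaultdict

# Hypothetical toy star schema: one dimension table and one fact table.
customers = [
    {"cust_id": 1, "region": "EU"},
    {"cust_id": 2, "region": "US"},
    {"cust_id": 3, "region": "EU"},
]
sales = [
    {"sale_id": 10, "cust_id": 1, "amount": 100},
    {"sale_id": 11, "cust_id": 2, "amount": 250},
    {"sale_id": 12, "cust_id": 3, "amount": 40},
    {"sale_id": 13, "cust_id": 2, "amount": 75},
]


def partition_dimension(rows, attribute):
    """Horizontally partition a dimension table on one attribute."""
    fragments = defaultdict(list)
    for row in rows:
        fragments[row[attribute]].append(row)
    return dict(fragments)


def derive_fact_partitions(fact_rows, dim_fragments, key):
    """Split the fact table so each fact fragment joins with one dimension fragment."""
    key_to_fragment = {
        row[key]: name for name, rows in dim_fragments.items() for row in rows
    }
    fragments = defaultdict(list)
    for row in fact_rows:
        fragments[key_to_fragment[row[key]]].append(row)
    return dict(fragments)


def build_join_index(fact_rows, dim_rows, key):
    """Map each dimension key to the positions of its fact rows in one fragment."""
    index = {row[key]: [] for row in dim_rows}
    for pos, row in enumerate(fact_rows):
        index[row[key]].append(pos)
    return index


dim_fragments = partition_dimension(customers, "region")
fact_fragments = derive_fact_partitions(sales, dim_fragments, "cust_id")
# Assumption: one join index per fragment; a real tuner would keep only some.
join_indexes = {
    name: build_join_index(fact_fragments.get(name, []), rows, "cust_id")
    for name, rows in dim_fragments.items()
}


def total_sales(region):
    """Answer a region query by touching only that region's fragment and index."""
    facts = fact_fragments.get(region, [])
    index = join_indexes.get(region, {})
    return sum(facts[pos]["amount"] for positions in index.values() for pos in positions)
```

In this toy setting, a query restricted to one region reads only that region's fact fragment, and the only extra structure stored is the per-fragment index, which is the kind of trade-off between query cost and storage that the abstract describes.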
Similar resources
Grid Services for Efficient Decentralized Indexation and Query Execution on Distributed Data Warehouses
Online analysis of large volumes of data constitutes an important part of today's business and scientific applications. Storing structured data in data warehouses following a multidimensional storage model provides efficient and reliable access to it. Distributed systems have become the first choice to cope with the increasing volume and complexity of data warehouses. In order to keep up with this tr...
Fuzzy multi-criteria selection procedures in choosing data source
Technology assessment and selection has a substantial impact on organizations' procedures with regard to technology transfer. Technological decisions are usually made by a group of experts, and integrating these viewpoints into a single decision can be quite complex. Today, operational databases and data warehouses exist to manage and organize data with specific features and henceforth, th...
Optimizing Communication for Multi-Join Query Processing in Cloud Data Warehouses
In this paper, we present two storage structures, PK-map and Tuple-index-map, to improve the performance of query execution and inter-node communication in Cloud Data Warehouses. Cloud Data Warehouses require Read-Optimized databases because large amounts of historical data are integrated on a regular basis to facilitate analytical applications for report generation, future analysis, and decision-ma...
A Scalable, Predictable Join Operator for Highly Concurrent Data Warehouses
Conventional data warehouses employ the query-at-a-time model, which maps each query to a distinct physical plan. When several queries execute concurrently, this model introduces contention, because the physical plans—unaware of each other—compete for access to the underlying I/O and computation resources. As a result, while modern systems can efficiently optimize and evaluate a single complex ...
S4: A New Secure Scheme for Enforcing Privacy in Cloud Data Warehouses
Outsourcing data into the cloud has become popular thanks to the pay-as-you-go paradigm. However, such practice raises privacy concerns. The conventional way to achieve data privacy is to encrypt sensitive data before outsourcing. When data are encrypted, a tradeoff must be achieved between security and efficient query processing. Existing solutions that adopt multiple encryption schemes induce a ...
Publication date: 2002